large-scale multipurpose benchmark dataset
Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems
Existing benchmark datasets for recommender systems (RS) either are created at a small scale or involve very limited forms of user feedback. RS models evaluated on such datasets often lack practical values for large-scale real-world applications. In this paper, we describe Tenrec, a novel and publicly available data collection for RS that records various user feedback from four different recommendation scenarios. To be specific, Tenrec has the following five characteristics: (1) it is large-scale, containing around 5 million users and 140 million interactions; (2) it has not only positive user feedback, but also true negative feedback (vs.
Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems
Existing benchmark datasets for recommender systems (RS) either are created at a small scale or involve very limited forms of user feedback. RS models evaluated on such datasets often lack practical values for large-scale real-world applications. In this paper, we describe Tenrec, a novel and publicly available data collection for RS that records various user feedback from four different recommendation scenarios. To be specific, Tenrec has the following five characteristics: (1) it is large-scale, containing around 5 million users and 140 million interactions; (2) it has not only positive user feedback, but also true negative feedback (vs. We verify Tenrec on ten diverse recommendation tasks by running several classical baseline models per task.
Large-Scale Multipurpose Benchmark Datasets For Assessing Data-Driven Deep Learning Approaches For Water Distribution Networks
Tello, Andres, Truong, Huy, Lazovik, Alexander, Degeler, Victoria
Currently, the number of common benchmark datasets that researchers can use straight away for assessing data-driven deep learning approaches is very limited. Most studies provide data as configuration files. It is still up to each practitioner to follow a particular data generation method and run computationally intensive simulations to obtain usable data for model training and evaluation. In this work, we provide a collection of datasets that includes several small and medium size publicly available Water Distribution Networks (WDNs), including Anytown, Modena, Balerma, C-Town, D-Town, L-Town, Ky1, Ky6, Ky8, and Ky13. In total 1,394,400 hours of WDNs data operating under normal conditions is made available to the community.
- North America > United States > Kentucky (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (2 more...)